(54) JAPANESE/ENGLISH DISCRIMINATING METHOD FOR DOCUMENT IMAGE,
DOCUMENT RECOGNIZING METHOD AND RECORDING MEDIUM 

(57)Abstract: 

PROBLEM TO BE SOLVED: To identify Japanese and English
accurately and at high speed, and to allow the range to be
identified to be either each character area or each page.
SOLUTION: After an input document image (101) is
reduced (102), black pixel connected components are
extracted (103), and character areas are generated (104)
by merging these components. For each generated
character area, a Japanese/English discriminating means
(105) classifies the connected components based on
their length and, based on the accumulated value of the
classification results, discriminates whether the area is
a Japanese area or an English area.
JPO and NCIPI are not responsible for any
damages caused by the use of this translation.

1. This document has been translated by computer, so the translation may not reflect the original
precisely.

2. **** shows a word which could not be translated.
3. In the drawings, no words are translated.



CLAIMS 



[Claim(s)] 

[Claim 1] A Japanese/English discrimination method for a document image, which judges whether each
character area in the document image is a Japanese area or an English area, characterized in that
whether the area is a Japanese area or an English area is judged using two or more judgment methods,
and the final judgment result is obtained by comparing these two or more judgment results.

[Claim 2] A Japanese/English discrimination method for a document image, which judges whether each
character area in the document image is a Japanese area or an English area, characterized in that the
black pixel connected components in a character area generated by reducing said document image are
classified based on their length, and whether each character area is a Japanese area or an English area
is judged based on the aggregated value of the classification results.
[Claim 3] The Japanese/English discrimination method for a document image according to claim 2,
characterized in that a different judgment method is used when the number of black pixel connected
components in the generated character area does not satisfy a predetermined condition.
[Claim 4] A Japanese/English discrimination method for a document image, which judges whether the
document image of each page is a Japanese document image or an English document image,
characterized in that the black pixel connected components in a page generated by reducing said
document image are classified based on their length, and whether each page is Japanese or English is
judged based on the aggregated value of the classification results.

[Claim 5] A Japanese/English discrimination method for a document image in which a page consists of
two or more character areas, which judges whether the document image of each page is a Japanese
document image or an English document image, characterized in that the black pixel connected
components in each character area generated by reducing said document image are classified based on
their length, whether each character area is a Japanese area or an English area is judged based on the
aggregated value of the classification results, and whether each page is Japanese or English is judged
based on these judgment results.
[Claim 6] A Japanese/English discrimination method for a document image, which judges whether each
character area in the document image is a Japanese area or an English area, characterized in that lines
are detected in said character area, neighboring bounding rectangles within each line are merged to
extract blocks, each block is judged to be Japanese, English, or undeterminable, the judgment results
for the blocks are totaled, and whether each character area is a Japanese area or an English area is
judged based on this total.

[Claim 7] The Japanese/English discrimination method for a document image according to claim 6,
characterized in that a different judgment method is used when the number of extracted blocks does
not satisfy a predetermined condition.

[Claim 8] A Japanese/English discrimination method for a document image in which a page consists of
two or more character areas, which judges whether the document image of each page is a Japanese
document image or an English document image, characterized in that lines are detected in said
character areas, neighboring bounding rectangles within each line are merged to extract blocks, each
block is judged to be Japanese, English, or undeterminable, the judgment results are totaled per page,
and whether each page is a Japanese document image or an English document image is judged based
on this total.

[Claim 9] A Japanese/English discrimination method for a document image in which a page consists of
two or more character areas, which judges whether the document image of each page is a Japanese
document image or an English document image, characterized in that lines are detected in said
character areas, neighboring bounding rectangles within each line are merged to extract blocks, each
block is judged to be Japanese, English, or undeterminable, the judgment results are totaled for each
character area, whether each character area is a Japanese area or an English area is judged based on
that total, these area judgment results are totaled per page, and whether each page is a Japanese
document image or an English document image is judged based on that total.

[Claim 10] A document recognition method characterized by judging whether a document image is a
Japanese document image or an English document image, and performing document recognition
processing according to this judgment result.

[Claim 11] A document recognition method characterized by dividing a document image into two or
more character areas, judging for each divided character area whether it is a Japanese document area
or an English document area, and performing document recognition processing according to this
judgment result.

[Claim 12] A computer-readable recording medium on which is recorded a program for causing a
computer to realize, in order to judge whether each character area in a document image is a Japanese
area or an English area, a function of judging whether the area is a Japanese area or an English area
using two or more judgment methods, and a function of obtaining the final judgment result by
comparing these two or more judgment results.

[Claim 13] A computer-readable recording medium on which is recorded a program for causing a
computer to realize, in order to judge whether each character area in a document image, or the
document image of each page, is Japanese or English, a function of classifying the black pixel
connected components in a character area or page generated by reducing said document image based
on their length, and a function of judging whether each character area or each page is Japanese or
English based on the aggregated value of the classification results.

[Claim 14] A computer-readable recording medium on which is recorded a program for causing a
computer to realize, in order to judge whether each character area in a document image is a Japanese
area or an English area, or, where a page consists of two or more character areas, to judge whether the
document image of each page is a Japanese document image or an English document image, a function
of detecting lines in said character area, a function of merging neighboring bounding rectangles within
each line to extract blocks, a function of judging each block to be Japanese, English, or undeterminable,
a function of totaling these judgment results for the blocks or per page, and a function of judging,
based on this total, whether each character area is a Japanese area or an English area, or whether each
page is a Japanese document image or an English document image.

[Claim 15] A computer-readable recording medium on which is recorded a program for causing a
computer to realize a function of judging whether a document image is a Japanese document image or
an English document image, or of dividing a document image into two or more character areas and
judging for each divided character area whether it is a Japanese document area or an English document
area, and a function of performing document recognition processing according to this judgment result.






DETAILED DESCRIPTION 



[Detailed Description of the Invention] 
[0001] 

[Field of the Invention] This invention relates to a Japanese/English discrimination method and
recording medium for a document image, which judge whether each character area in the document
image is a Japanese area or an English area, and to a document recognition method and recording
medium which perform recognition processing after judging whether a document image is a Japanese
document image or an English document image.
[0002] 

[Description of the Prior Art] When performing character recognition processing on a document image,
it is necessary to choose the appropriate language. That is, if Japanese text is recognized with English
OCR, it can only be recognized as alphabetic letters or digits; conversely, if English text is recognized
with Japanese OCR, the recognition rate is lower than when English OCR is used, because of
differences in character segmentation and language processing.

[0003] Therefore, language identification must be performed before character recognition processing.
Various techniques for discriminating the character type in a document have been proposed. For
example, there is a document recognition apparatus that counts the number of black/white transitions
in the vertical or horizontal direction of a binarized character line and identifies the character type
based on the distribution of the counts (see JP,5-108876,A).

[0004] There is also a document recognition apparatus that recognizes the words it has read and
determines the language of the recognized characters based on the rate at which the recognition result
matches a dictionary (see JP,6-150061,A).
[0005] 

[Problem(s) to be Solved by the Invention] The former apparatus uses the number of black/white
transitions as the feature for identifying the character type, but this feature varies greatly with the font
and with the content of the document (the proportions of kana, kanji, digits, and so on), so the
identification accuracy is low.
[0006] With the latter apparatus, on the other hand, character recognition is actually performed once,
so if the OCR performs well the character type becomes clear with considerable probability, and
Japanese/English discrimination can be performed with good accuracy. However, OCR has the problem
that its processing takes a long time.

[0007] This invention was made in consideration of the above situation. The object of this invention is
to provide a Japanese/English discrimination method and recording medium for a document image
which identify Japanese and English accurately and at high speed, and which can perform the
identification either for each character area or for each page, and further to provide a document
recognition method and recording medium which judge a document image and then perform the
optimal document recognition processing.
[0008] 
[Means for Solving the Problem] To attain said object, the invention according to claim 1 is a
Japanese/English discrimination method for a document image, which judges whether each character
area in the document image is a Japanese area or an English area, characterized in that whether the
area is a Japanese area or an English area is judged using two or more judgment methods, and the final
judgment result is obtained by comparing these two or more judgment results.
[0009] The invention according to claim 2 is a Japanese/English discrimination method for a document
image, which judges whether each character area in the document image is a Japanese area or an
English area, characterized in that the black pixel connected components in a character area generated
by reducing said document image are classified based on their length, and whether each character area
is a Japanese area or an English area is judged based on the aggregated value of the classification
results.
[0010] The invention according to claim 3 is characterized in that a different judgment method is used
when the number of black pixel connected components in the generated character area does not satisfy
a predetermined condition.

[0011] The invention according to claim 4 is a Japanese/English discrimination method for a document
image, which judges whether the document image of each page is a Japanese document image or an
English document image, characterized in that the black pixel connected components in a page
generated by reducing said document image are classified based on their length, and whether each
page is Japanese or English is judged based on the aggregated value of the classification results.

[0012] The invention according to claim 5 is a Japanese/English discrimination method for a document
image in which a page consists of two or more character areas, which judges whether the document
image of each page is a Japanese document image or an English document image, characterized in that
the black pixel connected components in each character area generated by reducing said document
image are classified based on their length, whether each character area is a Japanese area or an English
area is judged based on the aggregated value of the classification results, and whether each page is
Japanese or English is judged based on these judgment results.

[0013] The invention according to claim 6 is a Japanese/English discrimination method for a document
image, which judges whether each character area in the document image is a Japanese area or an
English area, characterized in that lines are detected in said character area, neighboring bounding
rectangles within each line are merged to extract blocks, each block is judged to be Japanese, English,
or undeterminable, the judgment results for the blocks are totaled, and whether each character area is
a Japanese area or an English area is judged based on this total.
[0014] The invention according to claim 7 is characterized in that a different judgment method is used
when the number of extracted blocks does not satisfy a predetermined condition.
[0015] The invention according to claim 8 is a Japanese/English discrimination method for a document
image in which a page consists of two or more character areas, which judges whether the document
image of each page is a Japanese document image or an English document image, characterized in that
lines are detected in said character areas, neighboring bounding rectangles within each line are merged
to extract blocks, each block is judged to be Japanese, English, or undeterminable, the judgment results
are totaled per page, and whether each page is a Japanese document image or an English document
image is judged based on this total.

[0016] The invention according to claim 9 is a Japanese/English discrimination method for a document
image in which a page consists of two or more character areas, which judges whether the document
image of each page is a Japanese document image or an English document image, characterized in that
lines are detected in said character areas, neighboring bounding rectangles within each line are merged
to extract blocks, each block is judged to be Japanese, English, or undeterminable, the judgment results
are totaled for each character area, whether each character area is a Japanese area or an English area
is judged based on that total, these area judgment results are totaled per page, and whether each page
is a Japanese document image or an English document image is judged based on that total.

[0017] The invention according to claim 10 is characterized by judging whether a document image is a
Japanese document image or an English document image, and performing document recognition
processing according to this judgment result.

[0018] The invention according to claim 11 is characterized by dividing a document image into two or
more character areas, judging for each divided character area whether it is a Japanese document area
or an English document area, and performing document recognition processing according to this
judgment result.

[0019] The invention according to claim 12 is characterized by being a computer-readable recording
medium on which is recorded a program for causing a computer to realize, in order to judge whether
each character area in a document image is a Japanese area or an English area, a function of judging
whether the area is a Japanese area or an English area using two or more judgment methods, and a
function of obtaining the final judgment result by comparing these two or more judgment results.

[0020] The invention according to claim 13 is characterized by being a computer-readable recording
medium on which is recorded a program for causing a computer to realize, in order to judge whether
each character area in a document image, or the document image of each page, is Japanese or English,
a function of classifying the black pixel connected components in a character area or page generated
by reducing said document image based on their length, and a function of judging whether each
character area or each page is Japanese or English based on the aggregated value of the classification
results.

[0021] The invention according to claim 14 is characterized by being a computer-readable recording
medium on which is recorded a program for causing a computer to realize, in order to judge whether
each character area in a document image is a Japanese area or an English area, or, where a page
consists of two or more character areas, to judge whether the document image of each page is a
Japanese document image or an English document image, a function of detecting lines in said character
area, a function of merging neighboring bounding rectangles within each line to extract blocks, a
function of judging each block to be Japanese, English, or undeterminable, a function of totaling these
judgment results for the blocks or per page, and a function of judging, based on this total, whether each
character area is a Japanese area or an English area, or whether each page is a Japanese document
image or an English document image.

[0022] The invention according to claim 15 is characterized by being a computer-readable recording
medium on which is recorded a program for causing a computer to realize a function of judging
whether a document image is a Japanese document image or an English document image, or of dividing
a document image into two or more character areas and judging for each divided character area
whether it is a Japanese document area or an English document area, and a function of performing
document recognition processing according to this judgment result.
[0023] 

[Embodiment of the Invention] Hereafter, one example of this invention is explained concretely with
reference to the drawings.

<Example 1> Drawing 1 shows the configuration of Example 1 of this invention. In the drawing, 101 is
an image input means which inputs a document image, 102 is an image reduction means which reduces
the input document image, 103 is a connected component extraction means which extracts connected
components from the document image, 104 is an area generation means which generates character
areas by classifying and merging the extracted connected components, 105 is a Japanese/English
discrimination means which discriminates Japanese and English per character area or per page, 106 is
a control section which controls the whole, 107 is a data storage section which stores various data such
as the input document image data, connected component data, and area data, 108 is a data
communication path, and 109 is a data communication means for connecting to a host or the like via a
network, a line, or the like.
[0024] Drawing 2 shows the processing flow chart of the whole of Example 1 of this invention.
Hereafter, the processing operation of this invention is explained with reference to Drawing 2. First,
the image input means 101 obtains a document image by reading a document (step 201). This image
input means is a scanner, a facsimile, or the like; the image may also be obtained from another device
over a network through the data communication means 109.

[0025] Next, the image reduction means 102 reduces the input document image (step 202). This
processing is, for example, an OR reduction of the input document image to 1/8. That is, each 8x8
pixel block is reduced to 1 pixel, and if there is at least one black pixel among the 64 pixels, the
reduced pixel is made a black pixel.
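
The OR reduction of step 202 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `or_reduce`, the list-of-rows image representation (1 = black, 0 = white), and the handling of partial blocks at the right and bottom edges are all assumptions.

```python
def or_reduce(image, factor=8):
    """OR-reduce a binary image: each factor x factor block becomes one pixel,
    which is black (1) if any pixel in the block is black.

    `image` is a list of equal-length rows of 0/1 values (assumed layout).
    Partial blocks at the edges are reduced like full blocks (assumption).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out_h = (height + factor - 1) // factor
    out_w = (width + factor - 1) // factor
    reduced = [[0] * out_w for _ in range(out_h)]
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if pixel:
                # Any black pixel turns its whole block black.
                reduced[y // factor][x // factor] = 1
    return reduced


# A 16x16 white image with a single black pixel at row 3, column 9.
page = [[0] * 16 for _ in range(16)]
page[3][9] = 1
small = or_reduce(page)  # 2x2 result; only the top-right block is black
```

Because one black pixel is enough to blacken a reduced pixel, closely spaced strokes fuse into a single blob, which is the property the later steps rely on.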
[0026] The connected component extraction means 103 extracts black pixel connected components
from the reduced image (step 203). The area generation means 104 classifies and merges the extracted
connected components and generates character areas (step 204). The well-known method described in
JP,6-20092,A may be used as this area generation method. At this time, the information on the
connected components which constitute each character area is stored and held in the data storage
section 107.
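
Step 203 can be illustrated with a small breadth-first labeling routine that returns the bounding rectangle of each black pixel connected component. This is a sketch under stated assumptions: the name `connected_component_boxes`, the `(x, y, width, height)` tuple layout, and the use of 8-connectivity are choices made here and are not specified in the text. The area generation of step 204 (JP,6-20092,A) is not reproduced.

```python
from collections import deque


def connected_component_boxes(image):
    """Return bounding boxes (x, y, width, height) of black connected components.

    `image` is a list of rows of 0/1 values (1 = black). Pixels touching
    horizontally, vertically, or diagonally are treated as connected
    (8-connectivity, an assumption).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not image[y0][x0] or seen[y0][x0]:
                continue
            # Flood-fill one component, tracking its extent.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:
                x, y = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes
```

Only the bounding rectangles are kept, since the discrimination that follows uses rectangle shape rather than pixel content.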

[0027] Then, the Japanese/English discrimination means 105 judges whether each generated character
area is Japanese or English (step 205).

[0028] In step 202, the OR reduction causes nearby black pixels in the image to merge. In English,
there is a space between words, while the gaps between the characters within a word are very narrow.
In Japanese, on the other hand, character spacing does not vary much except before and after
punctuation.

[0029] Drawing 3 shows example images of English and Japanese text together with their bounding
rectangles. The bounding rectangles 302 show the result of reducing the English image 301 and
extracting the connected components (because reduction has been applied, the bounding rectangles
302 should properly be smaller than the image 301, but they are drawn at the same size here). In the
English image, the connected components are merged word by word.
[0030] When the Japanese example images 303 and 305 are reduced in the same way and the extracted
connected components are expressed as bounding rectangles, the results are as shown by the bounding
rectangles 304 and 306, respectively.

[0031] In the case of English, since the number of characters which constitute a word is constant to
some extent, bounding rectangles with an aspect ratio from 2 up to about 6 or 7 are the most numerous.
In the case of Japanese, on the other hand, long rectangles which rarely appear in English arise, as
shown by the bounding rectangles 304, or conversely small rectangles arise, as shown by the bounding
rectangles 306.

[0032] Therefore, the bounding rectangles of the connected components are classified into three kinds,
"short", "medium", and "long", and the counts are totaled for each character area. Drawing 4 shows the
processing flow chart of the Japanese/English judgment of Example 1. The processing of Drawing 4 is
performed for each character area. Each rectangle is classified as follows: when the line direction is
horizontal, for example, a rectangle is "short" if width/height is 2 or less, "medium" if width/height is
from 2 to 6, and "long" if it is greater than that (step 401). The classification results in the character
area are then totaled (step 402), and Japanese or English is judged for each character area (step 403).
Here, with the number of "short" rectangles denoted SCNT, the number of "medium" rectangles
denoted NCNT, and the number of "long" rectangles denoted LCNT, the Japanese/English judgment is
carried out as shown in Drawing 8 (detail flowchart of step 403).
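
Steps 401 and 402 amount to bucketing each bounding rectangle by its aspect ratio and counting the buckets. The sketch below assumes three class names ("short", "medium", "long", matching the counters SCNT, NCNT, and LCNT), treats the boundary values 2 and 6 as belonging to the lower class, and uses height/width for vertical lines; the boundary handling, the vertical case, and the function names are assumptions, since the text gives only the horizontal thresholds as an example.

```python
def classify_rectangle(width, height, horizontal=True):
    """Classify one bounding rectangle by its length along the line direction.

    For horizontal lines the ratio is width/height; for vertical lines the
    ratio height/width is used (an assumption by symmetry).
    """
    ratio = width / height if horizontal else height / width
    if ratio <= 2:
        return "short"
    if ratio <= 6:
        return "medium"
    return "long"


def tally_rectangles(boxes, horizontal=True):
    """Count short/medium/long rectangles among (x, y, width, height) boxes."""
    counts = {"short": 0, "medium": 0, "long": 0}
    for _x, _y, width, height in boxes:
        counts[classify_rectangle(width, height, horizontal)] += 1
    return counts


# Three rectangles of height 1 with widths 1, 4, and 10.
counts = tally_rectangles([(0, 0, 1, 1), (2, 0, 4, 1), (7, 0, 10, 1)])
```

In the notation of the text, `counts["short"]` corresponds to SCNT, `counts["medium"]` to NCNT, and `counts["long"]` to LCNT.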
[0033] First, it is checked whether LCNT/(NCNT+SCNT) > Th1 holds (step 801). Th1 is a threshold
defined beforehand, for example about 0.3. If this condition holds, there are sufficiently many long
rectangles, and the character area concerned is judged to be a Japanese area (step 804).

[0034] Next, when the result of step 801 is No, it is checked whether NCNT/(LCNT+SCNT) < Th2
holds (step 802). Th2 is also a threshold defined beforehand, for example 3. If this condition holds,
there are few medium rectangles, and the character area concerned is judged to be a Japanese area
(step 804). When neither condition is satisfied, the area is judged to be an English area (step 803).
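
The two-stage test of Drawing 8 (steps 801 to 804) can be written compactly. In this sketch the function name `judge_area` and the return strings are hypothetical; the default thresholds are the example values from the text (about 0.3 and 3). The ratios are compared in cross-multiplied form so that a zero denominator does not raise an error, which is a choice made here and not something the text addresses.

```python
def judge_area(scnt, ncnt, lcnt, th1=0.3, th2=3.0):
    """Judge an area as "japanese" or "english" from rectangle counts.

    scnt, ncnt, lcnt: numbers of short, medium, and long rectangles.
    th1, th2: thresholds Th1 and Th2 (example values from the text).
    """
    # Step 801: LCNT / (NCNT + SCNT) > Th1, i.e. many long rectangles.
    if lcnt > th1 * (ncnt + scnt):
        return "japanese"
    # Step 802: NCNT / (LCNT + SCNT) < Th2, i.e. few medium rectangles.
    if ncnt < th2 * (lcnt + scnt):
        return "japanese"
    # Step 803: neither condition holds.
    return "english"


print(judge_area(scnt=1, ncnt=20, lcnt=0))   # mostly word-sized rectangles
print(judge_area(scnt=10, ncnt=12, lcnt=1))  # many short rectangles
```

With all counts at zero the function falls through to "english"; a caller would normally guard against empty areas, for example with the rectangle-count check of Example 2.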
[0035] <Example 2> In Example 1 above, the Japanese/English judgment is performed per character
area. In this case, some character areas may contain very few characters. In such a case, a sufficient
number of rectangles is not obtained, and it may become difficult to perform the Japanese/English
judgment from the ratio of the rectangle counts. Example 2 takes into account the case where the
number of rectangles is insufficient.

[0036] Drawing 5 shows the processing flow chart of Example 2. The Japanese/English discrimination
means 105 checks whether the number of rectangles in the area being totaled is sufficient, that is,
whether it is at or above a predetermined threshold Th (step 501). If it is not sufficient, Japanese/English
discrimination is performed using the OCR described in JP,6-150061,A mentioned above (step 503).
In this case, since there are few characters, performing OCR processing adds little to the processing
time. If the number of rectangles is sufficient, the Japanese/English discrimination by rectangle length
explained in Example 1 is performed (step 502).
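
The branching of Drawing 5 can be sketched as a small dispatcher. Everything named here is hypothetical: `judge_with_fallback`, the two callables standing in for the rectangle-length method and the OCR-based method, and the parameter `min_rectangles` for the threshold Th, whose value the text does not give. The comparison is written as "at or above Th", which is one reading of the translated condition.

```python
def judge_with_fallback(boxes, rectangle_judge, ocr_judge, min_rectangles):
    """Choose the judgment method from the number of rectangles in an area.

    boxes: bounding rectangles of the area's connected components.
    rectangle_judge: callable taking `boxes` (rectangle-length method, step 502).
    ocr_judge: callable taking no arguments (OCR-based method, step 503).
    min_rectangles: threshold Th; no value is given in the text.
    """
    # Step 501: is the number of rectangles sufficient?
    if len(boxes) >= min_rectangles:
        return rectangle_judge(boxes)
    return ocr_judge()


# Stand-in judges for illustration only.
result = judge_with_fallback(
    boxes=[(0, 0, 3, 1)] * 3,
    rectangle_judge=lambda b: "english",
    ocr_judge=lambda: "japanese",
    min_rectangles=5,
)
```

Passing the two methods as callables keeps the slow OCR path from being evaluated unless the area is too small for the rectangle statistics.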

[0037] <Example 3> Next, Example 3, which performs Japanese/English discrimination per page, is
explained. Drawings 6 and 7 show detail flowcharts of step 205 for Example 3. The method shown in
Drawing 6 totals the numbers of "short", "medium", and "long" rectangles over all the character areas,
that is, over the whole page (steps 601 and 602), and performs the Japanese/English judgment per page
using the result (step 603). This Japanese/English judgment is performed according to the processing
flow chart of Drawing 8. The thresholds Th1 and Th2 used here may differ from those used for
processing per character area.
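
The page-wide totaling of Drawing 6 (steps 601 and 602) is a sum of the per-area counts. The sketch below assumes each area's counts are held in a dictionary keyed by class name; the function name `pool_counts` and the key names are illustrative only.

```python
def pool_counts(area_counts):
    """Sum per-area rectangle counts into page-wide totals.

    area_counts: iterable of dicts with keys "short", "medium", "long"
    (assumed layout); missing keys count as zero.
    """
    total = {"short": 0, "medium": 0, "long": 0}
    for counts in area_counts:
        for kind in total:
            total[kind] += counts.get(kind, 0)
    return total


page_total = pool_counts([
    {"short": 2, "medium": 10, "long": 0},
    {"short": 1, "medium": 7, "long": 1},
])
```

The pooled totals would then go through the same two-threshold test as a single area, possibly with page-specific values of Th1 and Th2.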
[0038] The approach shown in drawing 7 performs Japanese-English distinction for every character
area (step 702), and judges the page concerned based on the results (step 703). Specifically, let
Jn be the number of areas judged to be Japanese and En the number of areas judged to be English.
If Jn>En, the page is judged to be a Japanese page; if En>Jn, it is judged to be an English page.
If Jn=En, the page may be rejected, or it may be judged to be either Japanese or English.
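The page-level vote of drawing 7 can be sketched as follows. This is only an illustrative sketch: the function name `judge_page`, the label strings, and the `tie` parameter are assumptions introduced here and do not appear in the publication.

```python
def judge_page(area_labels, tie="reject"):
    """Judge a page from per-area results by comparing Jn and En.

    area_labels: iterable of "japanese", "english", or any other label
    (for example "reject"), one per character area. The label strings
    and the tie parameter are hypothetical.
    """
    labels = list(area_labels)
    jn = sum(1 for label in labels if label == "japanese")
    en = sum(1 for label in labels if label == "english")
    if jn > en:
        return "japanese"
    if en > jn:
        return "english"
    # Jn = En: the text allows rejection or either language.
    return tie


print(judge_page(["japanese", "japanese", "english"]))
```

Passing `tie="japanese"` or `tie="english"` corresponds to the alternative the text allows for the Jn=En case.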

[0039] <Example 4> A Japanese-English discrimination approach that uses a feature different from
that of the above examples is explained. Drawing 9 shows the configuration of Example 4. It
differs from Example 1 in that a line extraction section 902, a block extract section 903, and an
in-block character kind distinction section 904 are provided. The other components are the same
as those of Example 1. Drawing 10 shows the processing flow chart of Example 4.

[0040] First, the line extraction section 902 extracts lines from the character area of the
document image (steps 1001 and 1002). When the technique described in JP,6-20092,A is used for
area generation, line information is already obtained at the stage where the area is extracted,
so that information may simply be used. Alternatively, the projection-based approach described in
the IECE paper "field division of the document image using circumference density distribution,
linear density, and the circumscription rectangle description" (Akiyama et al., Vol.J69-D No. 8,
August 1986) may be used.
[0041] Next, the block extract section 903 extracts blocks of words (step 1003). For this block
extraction, the approach previously proposed by the present applicants in Japanese Patent
Application No. 8-34781 may be used. That is, the block extract section 111 detects the
circumscribed rectangles inside the line data and groups them into block data. The grouping into
block data is done as follows. A histogram of the spacing between character rectangles is
computed. (At this stage one rectangle does not necessarily correspond to one character; for a
kanji, for example, the ** and structure parts often each become a separate rectangle.) Drawing 18
shows the distance between the extracted character rectangles. Drawing 19 shows the histogram of
rectangle spacing.

[0042] In this histogram, the peak at the shortest distance tends to correspond to the spacing
between the ** and structure parts of a kanji and to the spacing between letters within one word
of a proportionally spaced alphabet. Merging rectangles at this spacing rarely brings a different
character kind into a block, so block data is formed by merging them. This processing merges into
one unit a proportionally spaced word, or a single kanji that was separated into its ** and
structure parts.

[0043] Moreover, the peak at the longest distance in many cases corresponds to the distance
between words and to the distance between a punctuation mark and the following character. The
character kind often changes at these boundaries (especially between words), so the rectangles on
either side should not end up in the same block. Therefore, character rectangles separated by a
distance at or beyond the peak value with the longest distance are processed so that they are not
placed in the same block.

[0044] Furthermore, the distances A and B from a target rectangle to its two neighboring
rectangles are measured. When the difference (A-B) is at or beyond a predetermined threshold, the
rectangle at the longer distance is not merged, while the rectangle at the shorter distance is
merged. Drawing 20 illustrates the case where rectangles are not merged at locations where the
difference in spacing is large. In drawing 20, rectangles are not merged where the difference is
at or beyond the predetermined threshold, so three blocks are formed. With this processing, even
when the distance between words is small in absolute terms, as in proportionally spaced English,
it still differs from the distance between letters, so each word can be merged on its own.
Moreover, even with a proportional font, Japanese kanji are arranged at comparatively equal
intervals, so the processing is also convenient for grouping a Japanese sentence.
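The neighbor-distance rule of this paragraph can be illustrated with a small sketch. It assumes the rectangles of one line are already sorted left to right and that only the gaps between consecutive rectangles are given; the function name `split_into_blocks` and the threshold value in the usage line are hypothetical, and the histogram-peak rules of the preceding paragraphs are not modeled.

```python
def split_into_blocks(gaps, diff_threshold):
    """Group rectangles of one line into blocks.

    gaps[i] is the distance between rectangle i and rectangle i + 1.
    For each rectangle with two neighbours, the gaps A (left) and B
    (right) are compared; when they differ by diff_threshold or more,
    the longer gap becomes a block boundary and the shorter one is merged.
    """
    n_rects = len(gaps) + 1
    cut = [False] * len(gaps)
    for i in range(1, n_rects - 1):
        a, b = gaps[i - 1], gaps[i]
        if abs(a - b) >= diff_threshold:
            if a > b:
                cut[i - 1] = True
            else:
                cut[i] = True
    blocks, current = [], [0]
    for i, is_cut in enumerate(cut):
        if is_cut:
            blocks.append(current)
            current = [i + 1]
        else:
            current.append(i + 1)
    blocks.append(current)
    return blocks


# Eight rectangles with two wide gaps yield three blocks.
print(split_into_blocks([2, 2, 8, 2, 2, 9, 2], diff_threshold=4))
```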

[0045] With the above block extract approach, English, unlike a Japanese document, is separated
between words by a half-width space, so a word is kept from being mixed with other character
kinds in the same block data.

[0046] Then, the in-block character kind distinction section 904 performs Japanese-English
distinction for every block (step 1004). The approach of the application cited above may be used
for this as well. That is, the in-block character kind distinction section 904 judges whether
each block formed by the above processing is Japanese or alphabet. Everything inside one block is
judged to be the same character kind. The judgment is performed as follows. When the number of
vertical black runs of a rectangle within the block, or the count of tone reversals, is at or
beyond a predetermined threshold relative to the width of the rectangle, the rectangle is
discriminated as a Japanese character; the alphabet is identified based on the vertical
coordinate values of the rectangles within the extracted block. Drawing 21 (a) and (b) show
examples of the number of vertical runs for Japanese and for the alphabet. For the alphabet, in
the ideal case with no noise, at most four runs occur, in the letter "g" (drawing 21 (b)).
Therefore, when five or more runs are counted, the character is taken to be Japanese. In the case
of the character "**an image" shown in drawing 21 (a), the number of vertical runs changes as the
figure under the character shows.
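A minimal sketch of the vertical-run test follows. It assumes a binary bitmap given as a list of rows with 1 for black, and it simplifies the rule to "some column has five or more black runs"; the normalization by rectangle width mentioned in the text is not modeled, and all function names are hypothetical.

```python
def count_vertical_runs(column):
    """Count black runs (maximal sequences of 1s) in one pixel column."""
    runs = 0
    previous = 0
    for pixel in column:
        if pixel == 1 and previous == 0:
            runs += 1
        previous = pixel
    return runs


def max_vertical_runs(bitmap):
    """Largest number of vertical black runs over all columns of a bitmap."""
    if not bitmap or not bitmap[0]:
        return 0
    width = len(bitmap[0])
    return max(count_vertical_runs([row[x] for row in bitmap]) for x in range(width))


def looks_japanese(bitmap, run_threshold=5):
    """Five or more runs is taken as Japanese, as in the text."""
    return max_vertical_runs(bitmap) >= run_threshold


five_runs = [[1], [0], [1], [0], [1], [0], [1], [0], [1]]
print(looks_japanese(five_runs))
```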
[0047] The Japanese-English distinction means 905 totals the distinction results of the blocks
and performs Japanese-English distinction of the area concerned (step 1005). Here, let JCNT be
the number of blocks judged to be Japanese, ECNT the number of blocks judged to be English, and
NCNT the number of blocks judged to be undetermined. Drawing 11 is the detailed flow chart of
step 1005. The area is judged to be Japanese when JCNT*Th3>ECNT (steps 1101 and 1105); otherwise,
it is judged to be English when ECNT>JCNT (steps 1102 and 1104). In all other cases it is
rejected (step 1103). Here, the value Th3 is set to 2.
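The decision of drawing 11 can be written down directly. The sketch below follows the order of the tests given in the text; the function name and label strings are assumptions made for illustration.

```python
def judge_from_block_counts(jcnt, ecnt, th3=2):
    """Area-level judgment from per-block counts (Th3 = 2 as in the text)."""
    if jcnt * th3 > ecnt:
        return "japanese"
    if ecnt > jcnt:
        return "english"
    return "reject"


print(judge_from_block_counts(3, 5))  # 3 * 2 > 5, so Japanese
```

With Th3 = 2 the Japanese test is deliberately generous: an area is only judged English when English blocks outnumber Japanese blocks by at least two to one.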
[0048] <Example 5> In Example 4 above, the Japanese-English judgment is made per character area.
Some character areas, however, may contain very few characters. In such a case the number of
rectangles obtained is not sufficient, and it may become difficult to make a Japanese-English
judgment from the ratio of the block distinction results. Example 5 addresses the case where the
number of blocks is not sufficient.
[0049] Drawing 12 shows the processing flow chart of Example 5. The Japanese-English distinction
means 105 checks whether the character area being tallied contains enough blocks, that is,
whether the count is at or beyond the predetermined threshold Th (step 1201). If it does not,
Japanese-English distinction is performed using the OCR indicated by JP,6-150061,A cited above
(step 1203). In this case the area contains few characters, so OCR processing adds little to the
processing time. If the number of blocks is sufficient, Japanese-English discrimination by the
per-block distinction results, as explained in Example 4, is performed (step 1202).

[0050] <Example 6> Example 6 changes the per-character-area Japanese-English distinction of
Example 4 into Japanese-English distinction per page. Drawings 6 and 7 serve as the processing
flow charts of Example 6.

[0051] In the processing of drawing 6, JCNT, ECNT, and NCNT are totaled over every character
area, that is, over the whole page, and the Japanese-English judgment is made from the result by
the method of drawing 11 described above. Here, Th3 may differ from the value used for an
individual character area.

[0052] In the processing of drawing 7, distinction is first performed for every character area,
and the Japanese-English judgment of the page concerned is made from the results. Specifically,
let Jn be the number of areas judged to be Japanese and En the number of areas judged to be
English. If Jn>En, the page is judged to be a Japanese page; if En>Jn, an English page. If Jn=En,
the page may be rejected, or it may be judged to be either Japanese or English.
[0053] <Example 7> In Example 7, when Japanese-English distinction is performed per character
area or per page, as shown in drawing 13, Japanese-English distinction processing using rectangle
length (step 1301) and Japanese-English distinction processing using the per-block distinction
results (step 1302) are each performed. The final Japanese-English distinction is then made from
the two results (step 1303).

[0054] When both methods judge Japanese, or both judge English, that judgment is used as the
final result as it is. When one of them results in rejection, the judgment of the method that did
not reject is used as the final result.

[0055] When the two judgment results disagree, one being Japanese and the other English, one of
the following is done:

(1) Consider as rejection.

(2) Compute the reliability of both and adopt the result with the larger value.

The reliability of the distinction approach using rectangle length is defined, for example, as
follows (in each case the upper limit is set to 1).

When LCNT/(NCNT+SCNT)>Th1, with Th1=0.3, the reliability is LCNT/(NCNT+SCNT)*2.5.

When NCNT/(LCNT+SCNT)<Th2, with Th2=3, the reliability is (LCNT+SCNT)/NCNT*2.5.

When NCNT/(LCNT+SCNT)>Th2, with Th2=3, the reliability is NCNT/(LCNT+SCNT)*0.33.

[0056] The reliability of the distinction approach using the per-block distinction results is
defined, for example, as follows (in each case the upper limit is set to 1).

When JCNT*Th3>ECNT, with Th3=2, the reliability is JCNT/(ECNT*3).

When ECNT>JCNT, the reliability is ECNT/JCNT*0.7.
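The combination rule of Example 7 and the per-block reliability can be sketched as below. Several points are assumptions made here because the text does not specify them: the handling of a zero denominator (treated as reliability 1), the handling of equal reliabilities (treated as rejection), and all function and label names. The reliability of the rectangle-length method is passed in as a plain number rather than recomputed.

```python
def block_reliability(jcnt, ecnt, th3=2):
    """Reliability of the per-block method, capped at 1."""
    if jcnt * th3 > ecnt:
        # Japanese judgment; a zero ECNT is treated as full reliability (assumption).
        return 1.0 if ecnt == 0 else min(1.0, jcnt / (ecnt * 3))
    if ecnt > jcnt:
        # English judgment; a zero JCNT is treated as full reliability (assumption).
        return 1.0 if jcnt == 0 else min(1.0, ecnt / jcnt * 0.7)
    return 0.0


def combine(result_a, rel_a, result_b, rel_b, use_reliability=True):
    """Final judgment from two methods, following options (1) and (2)."""
    if result_a == result_b:
        return result_a
    if result_a == "reject":
        return result_b
    if result_b == "reject":
        return result_a
    if not use_reliability:
        return "reject"          # option (1)
    if rel_a > rel_b:            # option (2)
        return result_a
    if rel_b > rel_a:
        return result_b
    return "reject"              # equal reliabilities: assumption


print(combine("japanese", 0.9, "english", block_reliability(1, 5)))
```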

[0057] <Example 8> Drawing 14 shows the configuration of Example 8, and drawing 15 shows its
processing flow chart. In this example, the Japanese-English distinction section 1412 judges
whether the whole page of the input document is Japanese or English, using the approach of
Examples 3 and 6 described above (steps 1501 and 1502). Based on the distinction result, the
selection section 1403 chooses the English document recognition section 1404 or the Japanese
document recognition section 1405, document recognition processing for the selected language is
performed (steps 1504 and 1505), and the recognition result is output to an output section such
as a display (step 1506).
[0058] In addition, since Japanese and English differ in their attributes, it may be better to
change the area division processing, font discrimination processing, and the like according to
the language. For this reason, the document recognition section of this example includes not only
character recognition processing but also the area division processing and font discrimination
processing mentioned above.

[0059] <Example 9> Drawing 16 shows the configuration of Example 9 and drawing 17 shows its
processing flow chart. It differs from Example 8 in that Japanese-English discrimination is
performed for every character area. Accordingly, the field division section 1602 divides the
input document into character areas (steps 1701 and 1702). The field division section uses a
division approach that can be applied to both Japanese and English. After division, the
Japanese-English distinction section 1603 performs Japanese-English discrimination for every
character area using the approach of Example 1 described above (step 1704). Based on the
distinction result, the selection section 1604 chooses the English document recognition section
1605 or the Japanese document recognition section 1606. Document recognition processing for the
selected language is performed (steps 1705 and 1706), and the recognition result is output to the
output section 1607, such as a display (step 1707). In the document recognition section of
Example 9, font discrimination processing is also performed in addition to document recognition
processing.

[0060] <Example 10> Each example described above judges Japanese and English using black pixel
connected components or rectangle length as features. However, the judgment approach using black
pixel connected components requires processing time, and with the approach using rectangle length
the rate of rejection may become high. There is also a method of identifying Japanese or English
based on the peak locations of the frequency distribution of the relative positions, within the
line, of the top and bottom sides of the circumscribed rectangles (see JP,7-21817,B). With that
method, however, when a skewed document is input the frequency distribution changes greatly and
discrimination accuracy falls.

[0061] So, in this example, Japanese and English are identified accurately and at high speed for
every area of a document image by using the histogram of the ratio of the height of the
circumscribed rectangles in a line to the line height. For an area that cannot be distinguished
even by this approach, Japanese-English discrimination is performed using another method.

[0062] Drawing 22 shows the configuration of Example 10, and drawing 23 is the processing flow
chart of the whole of Example 10. First, the image input means 2201 obtains a document image by
reading a document (step 2301). This image input means is a scanner, a facsimile, or the like;
the image may also be obtained from another device via a network through the data communication
means 2207.

[0063] Next, the field generation means 2202 generates a character area (step 2302). The approach
indicated by JP,6-20092,A may be used as this area generation method. Next, the line extraction
means 2203 extracts the lines for character recognition from the character area. That is, the
circumscribed rectangles of the characters are obtained and merged to generate lines (step 2303).
The Japanese-English discernment means 2204 performs Japanese-English discrimination on the
generated character area (step 2304).

[0064] Japanese-English discrimination is performed as follows. Drawing 27 is the detailed flow
chart of Japanese-English discrimination (step 2304). Drawing 24 shows an example of an extracted
line and the circumscribed rectangles in the line. First, the frequency distribution of the ratio
of the circumscribed rectangle height to the line height is computed (steps 2701 and 2702). Let
the line height be lineheight and the rectangle height be height. The ratio is
heightrate=height*100/lineheight. Moreover, in the case of a skewed document like that of drawing
25, the maximum rectangle height in the line may be used as lineheight instead of the line height
in order to improve discrimination accuracy. That is, for a skewed input document, Japanese-English
discrimination is carried out based on the histogram of the ratio of the ****** circumscribed
rectangle height to the maximum rectangle height in the line.
[0065] The number of rectangles whose heightrate is 80 or more is set to lcnt, the number of
rectangles whose heightrate is 70 or more and less than 80 is set to ncnt, and the number of
rectangles whose heightrate is 40 or more and less than 70 is set to scnt. lcnt, ncnt, and scnt
are calculated from all the rectangles in a character area.

[0066] Drawing 26 shows an example of the numbers of rectangles measured for a Japanese document
and an English document. Generally, lcnt tends to be large for Japanese and scnt tends to be
large for English. Therefore, predetermined thresholds thJ and thE are set; the area is judged to
be Japanese when lcnt/scnt>thJ (step 2703) and English when lcnt/scnt<thE (step 2704). Otherwise
it is treated as an unknown area (step 2705).
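The height-ratio counting and the threshold test can be sketched as follows. The bin boundaries 80, 70, and 40 come from the text; the values passed for thJ and thE in the usage line, the treatment of scnt=0, and the function names are assumptions made only for illustration.

```python
def height_rate_counts(rect_heights, line_height):
    """Return (lcnt, ncnt, scnt) from rectangle heights in one area."""
    lcnt = ncnt = scnt = 0
    for height in rect_heights:
        heightrate = height * 100 / line_height
        if heightrate >= 80:
            lcnt += 1
        elif heightrate >= 70:
            ncnt += 1
        elif heightrate >= 40:
            scnt += 1
        # Rectangles below 40 are not counted in any bin.
    return lcnt, ncnt, scnt


def judge_by_height(lcnt, scnt, th_j, th_e):
    """Japanese if lcnt/scnt > thJ, English if lcnt/scnt < thE, else unknown."""
    if scnt == 0:
        # Division by zero is not discussed in the text (assumption).
        return "japanese" if lcnt > 0 else "unknown"
    ratio = lcnt / scnt
    if ratio > th_j:
        return "japanese"
    if ratio < th_e:
        return "english"
    return "unknown"


lcnt, ncnt, scnt = height_rate_counts([40, 38, 36, 30, 20, 10], line_height=40)
print(lcnt, ncnt, scnt, judge_by_height(lcnt, scnt, th_j=2.0, th_e=0.5))
```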
[0067] Japanese-English discrimination can be carried out on the above unknown area using a
statistical method. Drawing 28 is the detailed processing flow chart for an unknown area. For
example, the feature values lcnt, ncnt, and scnt of Japanese areas and English areas are
normalized beforehand, and the mean and the inverse of the covariance matrix are obtained for
Japanese and for English, respectively. Then the Mahalanobis distance to each of Japanese and
English is found using the mean and the inverse of the covariance matrix (steps 2801 and 2802).
[0068] Let Dj be the Mahalanobis distance to Japanese, De the Mahalanobis distance to English,
and Me and Mj predetermined thresholds. The area is judged to be English when Dj/De>Me (step
2803) and Japanese when Dj/De<Mj (step 2804). When neither condition is satisfied, it is judged
to be an unknown area (step 2805). In addition, the Euclidean distance or the city block distance
to the mean may be used instead of the above Mahalanobis distance.
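A pure-Python sketch of the Mahalanobis test follows. It assumes the means and the inverse covariance matrices were computed beforehand, as the text describes, and are passed in as plain lists; the function names, the handling of De=0, and the numbers in the usage line are hypothetical.

```python
import math


def mahalanobis_sq(x, mean, inv_cov):
    """Squared Mahalanobis distance (x - mean)^T inv_cov (x - mean)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    n = len(d)
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))


def judge_unknown_area(x, jp_mean, jp_inv_cov, en_mean, en_inv_cov, me, mj):
    """English if Dj/De > Me, Japanese if Dj/De < Mj, otherwise unknown."""
    dj = math.sqrt(mahalanobis_sq(x, jp_mean, jp_inv_cov))
    de = math.sqrt(mahalanobis_sq(x, en_mean, en_inv_cov))
    if de == 0:
        # Not covered by the text (assumption): x coincides with the English mean.
        return "english" if dj > 0 else "unknown"
    ratio = dj / de
    if ratio > me:
        return "english"
    if ratio < mj:
        return "japanese"
    return "unknown"


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(judge_unknown_area([0.55, 0.2, 0.25], [0.6, 0.2, 0.2], identity,
                         [0.2, 0.2, 0.6], identity, me=2.0, mj=0.5))
```

Replacing `inv_cov` with the identity matrix, as in the usage line, reduces the measure to the Euclidean distance that the text mentions as an alternative.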

[0069] For an area that is still judged to be unknown, Japanese-English discrimination is
performed using the reliability of English recognition. Drawing 29 is the detailed processing
flow chart of step 2805. Reliability is computed by English recognition (step 2901). Then, from
the computed reliabilities, Good is set to the number of words with reliability of 60% or more,
Bad to the number of words with reliability less than 60% but not 0, and Zelo to the number of
words with reliability 0 (step 2902).

[0070] Letting Value be the decision value of Japanese-English discrimination,
Value=Good/(Good+Bad+Zelo) is computed (step 2903). If Value exceeds the predetermined threshold
th_eocr (step 2904), the area is judged to be English; if it is less than the threshold, the area
is judged to be Japanese.
[0071] In addition, Zelo may be weighted. When one Zelo is counted as three Bad, Bad=Bad+Zelo*3,
so Value=Good/(Good+Bad). In this case as well, the area can be judged to be English if Value
exceeds the threshold th_eocr and Japanese if it is less than the threshold. Thus, even in an
area with too few characters for the Japanese-English judgment, discrimination is carried out
using the reliability of English recognition, so Japanese-English discrimination per area is
performed accurately.
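The decision value based on English recognition reliability can be sketched as below. The sketch assumes word reliabilities are given as percentages from 0 to 100; the function names, the `zero_weight` parameter, the treatment of Value equal to the threshold (judged Japanese), and the threshold in the usage line are assumptions.

```python
def english_ocr_value(word_reliabilities, zero_weight=1):
    """Value = Good / (Good + Bad + Zelo * zero_weight).

    Good: reliability >= 60, Zelo: reliability == 0, Bad: everything else.
    zero_weight=3 corresponds to counting one Zelo as three Bad.
    """
    good = sum(1 for r in word_reliabilities if r >= 60)
    zelo = sum(1 for r in word_reliabilities if r == 0)
    bad = len(word_reliabilities) - good - zelo
    denominator = good + bad + zelo * zero_weight
    return good / denominator if denominator else 0.0


def judge_by_english_ocr(word_reliabilities, th_eocr, zero_weight=1):
    """English if Value exceeds the threshold, otherwise Japanese."""
    value = english_ocr_value(word_reliabilities, zero_weight)
    return "english" if value > th_eocr else "japanese"


print(english_ocr_value([90, 80, 50, 0]))
print(judge_by_english_ocr([90, 80, 50, 0], th_eocr=0.4, zero_weight=3))
```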

[0072] <Example 11> In this example, circumscribed rectangles are generated from a reduced
version of the input document image, the generated rectangles are suitably merged, and
Japanese-English discrimination is performed more accurately using the histogram of the aspect
ratio of the rectangles after merging.
[0073] Drawing 30 shows the configuration of Example 11, and drawing 31 is the processing flow
chart of the whole of Example 11. As in the above example, the document image input by the image
input means 3001 is reduced by the image cutback means 3002 (steps 3101 and 3102). This
processing applies OR compression to the document image, reducing it to about 1/4; for example,
each 4x4 pixels is reduced to 1 pixel, and the reduced pixel is black if at least one of the 16
pixels is black.
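The OR compression described here can be sketched in a few lines. The image is assumed to be a list of rows with 1 for black and 0 for white; the function name `or_reduce` and the handling of image sizes that are not multiples of the factor (partial blocks at the edges are reduced as they are) are assumptions.

```python
def or_reduce(image, factor=4):
    """Reduce a binary image by OR-ing each factor x factor block to one pixel."""
    height = len(image)
    width = len(image[0]) if image else 0
    reduced = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            has_black = any(
                image[y][x]
                for y in range(top, min(top + factor, height))
                for x in range(left, min(left + factor, width))
            )
            row.append(1 if has_black else 0)
        reduced.append(row)
    return reduced


page = [[0] * 8 for _ in range(8)]
page[5][6] = 1                      # a single black pixel
print(or_reduce(page))              # only the lower-right reduced pixel is black
```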
[0074] Next, the field generation means 3003 generates a character area (step 3103). The approach
indicated by JP,6-20092,A may be used as this area generation method. Then, the rectangle
integrated means 3004 merges rectangles so that the characteristics of Japanese and English
appear clearly (step 3104). For example, as shown in drawing 32, adjacent rectangles 1 and 2 are
merged when their x-coordinates are close and their y-coordinates (vertical direction) are very
close (for example, when the horizontal distance between the rectangles is smaller than the
distance equivalent to an English space). Moreover, for example, as shown in drawing 33, when the
left-hand rectangle 1 includes the right-hand rectangle 2 in the y-coordinate and the
x-coordinates of adjacent rectangles 1 and 2 are very close, the rectangles are merged (for
example, when the horizontal distance between the rectangles is smaller than the distance
equivalent to an English space).
[0075] Then, using the rectangle aspect ratio (rectangle vertical length / rectangle horizontal
length), the rectangles are divided into four feature classes: long rectangles, medium
rectangles, small rectangles, and minimum rectangles (drawing 34). Generally, long rectangles
appear at a high rate in Japanese, and medium rectangles appear at a high rate in English. Using
this difference in characteristics, the Japanese-English discernment means 3005 creates a
discrimination formula and performs Japanese-English discrimination (step 3105). Drawing 35 is
the detailed flow chart of the Japanese-English discrimination processing.
[0076] For example, the number lcnt of long rectangles, the number ncnt of medium rectangles, the
number scnt of small rectangles, and the number sscnt of minimum rectangles (which are noise in
many cases) in an area are computed (step 3501). The rate of long rectangles in the area,
ratio1=lcnt/(ncnt+scnt), is computed (step 3502), and the rate of medium rectangles in the area,
ratio2=ncnt/(lcnt+scnt), is computed (step 3503). When computing these rates, sscnt is
disregarded as noise.

[0077] Taking ratio1 as the x-coordinate and ratio2 as the y-coordinate, the plane is divided
into a Japanese area, an English area, and a rejection area so that misdiscrimination is as small
as possible and the part where Japanese and English overlap is rejected. For example, if
ratio2/ratio1>thE, the area is judged to be an English area (step 3504); if ratio2/ratio1<thJ, it
is judged to be a Japanese area (step 3505); otherwise the language is taken to be unknown (step
3506). Here, thE and thJ are predetermined thresholds.
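The ratio test of steps 3502 to 3506 can be sketched as follows. The formulas for ratio1 and ratio2 come from the text; the threshold values in the usage line, the handling of zero denominators, and the function name are assumptions made for illustration.

```python
def judge_by_aspect_counts(lcnt, ncnt, scnt, th_e, th_j):
    """English if ratio2/ratio1 > thE, Japanese if ratio2/ratio1 < thJ."""
    if ncnt + scnt == 0 or lcnt + scnt == 0:
        return "unknown"            # ratios undefined (assumption)
    ratio1 = lcnt / (ncnt + scnt)   # rate of long rectangles
    ratio2 = ncnt / (lcnt + scnt)   # rate of medium rectangles
    if ratio1 == 0:
        # No long rectangles at all; not covered by the text (assumption).
        return "english" if ratio2 > 0 else "unknown"
    quotient = ratio2 / ratio1
    if quotient > th_e:
        return "english"
    if quotient < th_j:
        return "japanese"
    return "unknown"


print(judge_by_aspect_counts(lcnt=8, ncnt=2, scnt=2, th_e=2.0, th_j=0.5))
```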

[0078] For an area judged to be unknown, Japanese-English discrimination is carried out using a
statistical method, as in Example 10. For example, the feature values lcnt, ncnt, and scnt of
Japanese areas and English areas are normalized beforehand, and the mean and the inverse of the
covariance matrix are obtained for Japanese and for English, respectively. The Mahalanobis
distance to each of Japanese and English is found using the mean and the inverse of the
covariance matrix. Let Dj be the Mahalanobis distance to Japanese, De the Mahalanobis distance to
English, and Me and Mj predetermined thresholds. The area is judged to be English when Dj/De>Me
and Japanese when Dj/De<Mj. When neither condition is fulfilled, it is judged to be unknown. In
addition, the Euclidean distance or the city block distance to the mean may be used instead of
the Mahalanobis distance.

[0079] <Example 12> This invention is not limited to the above examples and can also be realized
in software. In that case, as shown in drawing 36, a computer system consisting of a CPU, memory,
a display device, a hard disk, a keyboard, a CD-ROM drive, a scanner, and so on is prepared, and
the program that realizes the Japanese-English judging function and the document recognition
function of this invention is recorded on a computer-readable recording medium such as a CD-ROM.
The document image input from an image input means such as a scanner is temporarily stored on
the hard disk or the like. When this program is started, the temporarily saved document image
data is read, Japanese-English judging processing and document recognition processing are
performed, and the result is output to a display or the like.
[0080]

[Effect of the Invention] As explained above, according to the invention described in claims 1
and 12, two or more judgment approaches are used together, so Japanese and English can be
distinguished with high accuracy.

[0081] According to the invention described in claims 2, 3, 6, 7, 13, and 14, Japanese and
English can be distinguished accurately for every character area in a document image.

[0082] According to the invention described in claims 4, 5, 8, 9, 13, and 14, Japanese and
English can be distinguished accurately per page of a document image.
[0083] According to the invention described in claims 10, 11, and 15, suitable document
recognition processing is performed on the document image judged to be Japanese or English, so a
highly accurate recognition result can be obtained.



DESCRIPTION OF DRAWINGS 



[Brief Description of the Drawings] 

[Drawing 1] The configuration of the example 1 of this invention is shown. 

[Drawing 2] The processing flow chart of the whole example 1 of this invention is shown. 

[Drawing 3] Example images of English and Japanese sentences with their circumscribed rectangles
are shown.

[Drawing 4] The processing flow chart of a Japanese-English judging of an example 1 is shown. 

[Drawing 5] The processing flow chart of an example 2 is shown. 

[Drawing 6] The 1st detail flowchart of step 205 concerning an example 3 is shown.

[Drawing 7] The 2nd detail flowchart of step 205 concerning an example 3 is shown. 

[Drawing 8] The detail flowchart of step 403 is shown. 

[Drawing 9] The configuration of an example 4 is shown. 

[Drawing 10] The processing flow chart of an example 4 is shown. 

[Drawing 11] It is the flow chart of the detail of step 1005. 

[Drawing 12] The processing flow chart of an example 5 is shown.

[Drawing 13] The processing flow chart of an example 7 is shown. 

[Drawing 14] The configuration of an example 8 is shown. 

[Drawing 15] The processing flow chart of an example 8 is shown. 

[Drawing 16] The configuration of an example 9 is shown.

[Drawing 17] The processing flow chart of an example 9 is shown. 

[Drawing 18] The distance between the extracted alphabetic character rectangles is shown. 

[Drawing 19] The histogram of rectangle spacing is shown. 

[Drawing 20] It is a drawing explaining the case where rectangles are not merged at locations
where the difference in spacing between rectangles is large.

[Drawing 21] (a) and (b) show the example of the number of the perpendicular direction runs in the case 
of Japanese and the alphabet. 

[Drawing 22] The configuration of an example 10 is shown. 
[Drawing 23] It is the processing flow chart of the whole example 10. 

[Drawing 24] An example of an extracted line and the circumscribed rectangles in the line is shown.
[Drawing 25] An example of a line and the circumscribed rectangles in the line when the document
is skewed is shown.
[Drawing 26] An example of the number of rectangles investigated about the Japanese
document and the English document is shown.

[Drawing 27] It is the detailed processing flow chart of Japanese-English discernment (step 2304). 

[Drawing 28] It is a detailed processing flow chart to an unknown field. 

[Drawing 29] It is the detailed processing flow chart of step 2805. 

[Drawing 30] The configuration of an example 11 is shown.

[Drawing 31] It is the processing flow chart of the whole example 11.

[Drawing 32] The example which unifies a rectangle is shown. 

[Drawing 33] Other examples which unify a rectangle are shown. 
[Drawing 34] The rectangle according to which it was classified into four kinds is shown.

[Drawing 35] It is the detailed processing flow chart of Japanese-English discernment processing of an
example 11.

[Drawing 36] The configuration of an example 12 is shown. 
[Description of Notations] 

101 Image Input Means 

102 Image Cutback Means 

103 Connected Component Extract Means 

104 Field Generation Means 

105 Japanese-English Distinction Means

106 Control Section 

107 Data Storage Section 

108 Data Communication Way 

109 Data Communication Means


